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APPROXIMATING THE LEADING SINGULAR TRIPLETS OF A 
LARGE MATRIX FUNCTION* 

SARAH W. GAAFt AND VALERIA SIMONCINI* 

Abstract. Given a large square matrix A and a sufficiently regular function / so that f(A) is 
well defined, we are interested in the approximation of the leading singular values and corresponding 
singular vectors of f(A), and in particular of ||/(A)||, where || • || is the matrix norm induced by 
the Euclidean vector norm. Since neither f(A) nor f(A)v can be computed exactly, we introduce 
and analyze an inexact Golub-Kahan-Lanczos bidiagonalization procedure, where the inexactness 
is related to the inaccuracy of the operations f(A)v , f(A)*v. Particular outer and inner stopping 
criteria are devised so as to cope with the lack of a true residual. Numerical experiments with the 
new algorithm on typical application problems are reported. 


1. Introduction. Given a large nxn complex matrix A and a sufficiently regular 
function / so that f(A) is well defined, we are interested in approximating the largest 
singular values and corresponding singular vectors of the matrix function f(A). This 
computation will also give an approximation to its 2-norm, namely, ||/(A)||, where 
|| • || is the matrix norm induced by the Euclidean vector norm, and it is defined as 
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In our presentation we will chiefly discuss this norm approximation because of its 
interest in applications. However, we shall keep in mind that the considered procedure 
allows us to also determine both associated left and right singular vectors, and that 
a group of singular triplets can be determined simultaneously. 

The problem of approximating the norm of a matrix function arises in the so¬ 
lution of stiff linear initial value problems |13j.|30|. in the evaluation of derivatives 
and perturbations of matrix functions, which arise for instance in electronic structure 
theory mmm, and in monitoring the magnitude of the inverse of distance ma¬ 
trices [T] . In numerical linear algebra the norm of matrix polynomials may be used in 
the analysis of iterative procedures, and the norm of rational matrix functions, and 
in particular of the transfer function may give information on the sensitivity of the 
matrix itself to perturbations; see, e.g., [34] .[5] and their references. 

If A were normal, then the approximation could be stated in terms of an eigen¬ 
value problem in A. Indeed, if A = QAQ* is the eigendecomposition of A with Q 
unitary and A diagonal, then f(A) = Qf(A)Q* [22] . so that the leading singular val¬ 
ues of f(A) could be determined by a procedure that approximates the eigenvalues of 
A. 

The problem is significantly more challenging if A is large and non-normal, since 
there is no relation between eigenvalues and singular values that can be readily ex¬ 
ploited during the computation. Moreover, although A may be sparse, in general f(A) 
will be dense, and it cannot be computed explicitly. We are thus left with procedures 
that use / and A by means of the action of f{A) to a vector v. The Lanczos bidi¬ 
agonalization is among the most used strategies for approximating selected singular 
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triplets of a given matrix. Given a matrix F : this procedure generates a sequence 
of orthonormal vectors {v 1: v 2 ,...} and {u 1: u 2 , ...} by alternating products of Fv 
and F* u. In our case, F = f(A) and therefore these matrix vector products can 
be approximately computed. In fact, since this computation is expensive, we shall 
consider an inexact implementation of the Lanczos bidiagonalization process, where 
at each iteration the action of f(A)v and f(A)*u is approximated with some loose 
tolerance by means of a projection method. The problem of approximating f(A)v 
has seen a great interest growth in the past fifteen years, due to the emerging occur¬ 
rence of this computation in many scientific and engineering applications; see, e.g., 
mmmmmm .[20:. and their references. For our purposes we shall use Krylov 
subspace methods for approximating f(A)* u and f(A)v at each iteration, equipped 
with a cheap stopping criterion that may also be adapted to the outer current accu¬ 
racy. We shall show that the inexactness in the Lanczos bidiagonalization causes the 
loss of the known symmetry structure of the process. Nonetheless, as is the case in 
finite precision analysis [26j, orthogonality of the basis can be preserved, so that the 
recurrence maintains its effectiveness. 

If a rough approximation to ||/(A)|| is the only quantity of interest, instead of 
a group of singular triplets, then other approaches could be considered. For instance, 
/ could be approximated by some other more convenient functions, and then the 
resulting matrix function norm could be more easily estimated. As an alternative, 
equivalent definitions of f(A) could be used, from which the norm could also be 
estimated; or, the relation of ||/(A)|| with other norms or with some other spectral 
tool could be used; some of these approaches are briefly recalled in section[2] Methods 
in the mentioned classes, however, usually at most provide the order of magnitude of 
the actual norm and are thus inappropriate if more correct digits are needed. 

This paper is organized as follows. Section [2] reviews some methods available for 
the approximation of ||/(A)||. In section [3] the standard Lanczos bidiagonalization is 
recalled and the general notation used in this paper is introduced. Section |4] presents 
the inexact Lanczos bidiagonalization procedure, including the details on the stopping 
criteria in section run Section [5] discusses the approximation of the matrix function 
multiplication, and a stopping criterion for its accuracy, while in section [6] a stopping 
criterion in case an inner flexible strategy is analyzed. In section 16.11 we show how 
specific spectral properties allow us to make a variable accuracy for the inner iteration 
feasible, which is finalized in section [6T2l Section [7] focuses on the practical implemen¬ 
tation and the numerical results are presented in section [5] We will conclude with 
some discussion in section [9] 

The following notation will be used throughout. The vector indicates the ith 
column of the identity matrix of a given dimension. The conjugate transpose of a 
matrix A will be denoted by A*. We will use the Matlab-like notation [x; y] to denote 
the column vector 

x , xer',yeC"». 

A. 

The Euclidean vector norm for vectors will be used, namely ||x|| = (]C”=i \ x i\ 2 )A 
for x £ C ra . Unless explicitly stated, the induced matrix norm mi) will be used for 
matrices. For A £ C nxn , spec(A) denotes the set of its eigenvalues, and W{A) = {z £ 
C : z = (x*Ax)/(x*x), x £ C"} is its field of values. 

2. Available techniques for estimating the norm. While the Lanczos bidi¬ 
agonalization is widely recognized as the method of choice for approximating selected 
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singular triplets of a large matrix, if one is only interested in estimates of ||/(A)|| 2 
with A non-Hermitian, then rather different procedures could also be used. A simple 
approach consists of roughly estimating || • || 2 by using some other matrix norm. For 
instance, 


||/(A)||2<^||/(A)|| p , p = 1, oo, 

where ||/(A)|| P is once again an induced norm [23] p. 365]. This bound is usually 
pessimistic, and it is clearly unsatisfactory for n large. The fact that for A large the 
entries of /(A) are not all readily available provides an additional challenge. 

In the following we describe a few approaches available in the literature that are 
tailored to the matrix function case. Some of them first determine an explicit upper 
bound for the norm, which only depends on scalar quantities. The core computation 
will then be to determine a good approximation to the obtained upper bound. The 
quality of the final estimate of ||/(A)|| 2 will thus depend both on the sharpness of 
the initial upper bound and on the accuracy of the computation. For general non¬ 
normal matrices the initial bound is often not very sharp, limiting the quality of the 
overall estimation. Finally, a computation-oriented estimation is the power method, 
which directly approximates ||/(A)|| 2 as the square root of the largest eigenvalue of 
/(A)*/(A). A more detailed list follows. 

1. Let r(A) be the numerical radius of A, that is r(A) = max{|z| : z £ W(A)}, 
where W(A) is the field of values of A. Since (see, e.g., [T5] Theorem 1.3-1]) 

r(A)<||A|| 2 <2r(A), 

by applying the bounds to f(A) instead of A, it is possible to estimate ||/(A)|| 2 
by means of r(/(A)); see, e.g., [35],[2H] for numerical methods to compute 
the numerical radius of a given matrix. A related special case is given by the 
exponential function, for which the bound 

II exp(A)|| < exp(a) (2.1) 

holds, where a is the largest eigenvalue of the Hermitian part of A, that is of 
i(A +A*) [T3] section 10.1]. 

2. If it is possible to find K > 0 and fl C C such that 

||/(A)|| 2 <A'||/||o, 

then it is sufficient to estimate ||/||n; here ||/||n is the Loo-norm of / on 
fl. This can be done for instance when / is a polynomial, [6J, for which 
I\ is known to be less than 11.08 and conjectured to be equal to 2, and 
fl coincides with the field of values of A. We refer to the Ph.D. thesis of 
D. Choi [5], for a discussion on the use if this bound when A is normal, or 
when A is a contraction; see also [35 for a detailed analysis of this bound 
when using pseudospectral information. The computational intensive task is 
given by the determination of fl. If fl coincides with W{A), then the cost 
of accurately approximating fl may be superior to that of approximating the 
single quantity ||/(A)|| 2 . 

3. This approach is in the same spirit as the one above, but with a higher 
computational component. For e > 0, let cr e (A) ={z£ spec(A + E) : ||A|| < 
s} and assume that / is analytic in cr e (A). If L e denotes the length of the 
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boundary da e (A) = {z £ C : ||(zl — A) 1 || = e 1 }, then by using the Cauchy 
integral expression for /(A) we obtain (see, e.g., (54] ) 

11 / 04)11 < ^- £ \\fba e - 

Although the involved quantities may be easier to compute than in the pre¬ 
vious case, the dependence on e > 0 remains not fully controllable. 

4. Using the relation ||/(A)||| = Amax(/(A)*/(A)) a run of a few iterations of 
the power method can give an estimate to Amax(/(A)*/(A)); see, e.g., [19l 
Algorithm 3.19] for an algorithm specifically designed for the largest singular 
triplet. 

The power method is probably the most appealing approach among the ones 
listed above. If a rough approximation is required, typically to determine the order 
of magnitude, then its low cost provides a satisfactory answer. However, if more than 
one digit of accuracy is required, then the process may become slow. As for with A, 
the stability in the computation may be highly influenced by the squaring; we refer 
to section [8] for an example of this well known phenomenon. 

3. Lanczos bidiagonalization. We start by recalling the Golub-Kahan bidiag- 
onalization process in our context, in terms of the matrix function /(A); then we will 
discuss how to actually obtain /(A) times a vector. Let Uo = 0 and /?i = 0, then for 
i = 1,..., m the following recurrence relations define the Lanczos algorithm 

feu i = f(A)wi — fe_iUj_i . 

fe+iv i+ i = f(A)*m - fevj. 1 

The coefficients fa are computed so that the corresponding vectors v’s and u’s have 
unit norm. By collecting the two sets of vectors as U m = [ui,..., u m ], V m = 
[vi,...,v m ], we observe that UfaU m = /, VfaV m = /, V^v m+1 = 0. Moreover, 
the two recurrences can be compactly written as 

f(A)V m = U m B m 

f(A)*U rn =V m B*r n +i32m+lVm+l e *mi (5-2) 


where B m is the following bidiagonal matrix 
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m 


fa fa 
fa fa 


e R 


mxm 


It can be shown that the columns of V m span the Krylov subspace /C m (/(A)*/(A), Vi) 
and the columns of U m span the Krylov subspace /C m (/(A)/(A)*, /(A)vi). Define 


and 
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2m — 
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L U rr, 


B m 
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W 2m 



0 

Vm 


o /(A) 
/(A)* 0 
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Then the recursion (13.21) can be rewritten as a standard Lanczos process, in the more 
compact matrix notation m p. 178-186], QH p. 448-449, p. 495]) 


J r W2m — Wfefem + p2m+l 


0 

v m+ l 


(3-3) 


For both even and odd j, the eigenvalues of Bj occur in ± pairs, with the exception 
of an extra extraneous zero eigenvalue in the odd case. Within this setting, is it thus 
possible to approximate the singular values of /(A) by the positive eigenvalues of fe m , 
or, equivalently, by the singular values of B m . In particular, for the largest singular 
value it holds that (see [221 Corollary 3.1.3, Lemma 3.3.1.]): 


cti(Bj_i) < efefe) < cri(/(A)), 2 <j<m. 


There are several advantages of the Golub-Kahan bidiagonalization over the simpler 
power method applied to /(A)*/(A), which are mainly related to the fact that the 
eigenvalue squaring in this latter problem may lead to severe loss of information 
in the case very small or very large singular values arise. In the inexact case the 
bidiagonal formulation also allows us to better trace the inexactness during the whole 
approximation process; this is discussed in the next section. 

4. Inexact Lanczos bidiagonalization. When neither the explicit computa¬ 
tion of the matrix /(A) nor the accurate operation /(A)v (or /(A)*v) are feasible, 
then approximate computations must be performed, resulting in an inexact Lanc¬ 
zos bidiagonalization procedure. As a consequence, the recurrence (13.21) needs to be 
significantly revised so as to aknowledge for the quantities that are actually computed. 

For a given v, the exact matrix-vector multiplication /(A)v has to be replaced 
by an inner procedure that approximates the resulting vector up to a certain accuracy. 
The same holds for the operation /(A)*u for a given vector u. For the sake of the 
analysis, at each iteration k we shall formalize this difference by writing, for some 
matrices C t and Di, 


few = {f(A)vi + CiVi ) - fe_iUj—i 
fe + iv i+ i = (f(A)*Ui + Dim ) - fevj, 


where C), Di implicitly represent the perturbation induced by the approximate compu¬ 
tations. Since in general f(A)* + Di is no longer the conjugate transpose of /(A) + Ci , 
orthogonality of a new vector v m+ i has to be enforced by explicit orthogonalization 
with respect to all previous vectors v,. 1 < i < m. The same holds for the vectors w, 
i = 1,..., m. Therefore, instead of one bidiagonal matrix B m in the exact relation, we 
now obtain an upper triangular matrix M m and an upper Hessenberg matrix T m . This 
leads to the following relations for the inexact (perturbed) Lanczos bidiagonalization: 


(/(A) + <L m )Vm = U m M m 

(/(A) T 2) m )t/ m = V-rriAm A lm+l,mVm+ Wm, 

where € m = Y^k=i an d = SfcLi Ac u fc u fc- The matrices V m and U m 

are different from the matrices in the exact relation, but they still have orthonormal 
columns. 

The inexact Lanczos bidiagonalization can also be described using the notation 
of (13.31) . Define 


fern — 
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T m 


M n 
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W 2 m 
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0 

V m 
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and the perturbation matrix 


0 £rn 

o 


Q2m — 

The perturbed relation thus becomes 

•HW 2 m T ^2m = b^2m^2m T tm +1 

where 

BY^2m + $2 m = + Sm)'^2m = 


y^2m — : £mW2m- 


0 

v m+ i 


i)2 m 


(4.1) 


0 f(A) + U 
f(A)* + T) m 0 


y^2m — : ‘^2mbV2m- 


In contrast to the exact case, the space spanned by the columns of W^m is not a Krylov 
subspace. However, when is small, this new space is close to an invariant 

subspace of the perturbed matrix because then J r 2 m H , 2 m ~ W 2 mB 2 m ■ Notice the 
similarity of (14.11) with equation (3.1) in [31] , which shows that with this formulation, 
the inexact projection problem amounts to solving a structured eigenvalue problem, 
where the original Hermitian matrix T has been perturbed by a structured non- 
Hermitian perturbation £ m . The theory in m can then be used to analyze and 
monitor the inexact computations, although the general results in ]31j should be 
carefully adapted to the new problem structure. 

If £ m is small in norm, the eigenvalues of the non-Hermitian matrix T 2 m are 
small perturbations of the eigenvalues of the Hermitian matrix J-. Indeed, the eigen¬ 
values of the perturbed matrix J~ 2 rn lie in discs with radius ||£ m || and center the 
(real) eigenvalues of T (see, e.g., [331 section IV, Theorem 5.1]). Therefore, for small 
perturbations in the computations, the eigenvalues of the symmetric matrix T will 
be perturbed accordingly. On the other hand, in the following we shall consider the 
case when ||£ m || is larger than usually allowed by a perturbation analysis argument, 
therefore different strategies need to be devised to ensure good approximations to the 
wanted eigenvalues of T. 

^Following the standard procedure of the exact case, we should consider the ma¬ 
trix Bim to approximate the largest eigenpairs of B‘im^ and according to the discussion 
above, of T . Due to the non-Hermitian structure of B 2 m, however, there are different 
matrices that can provide the sought after singular value information, namely the 
matrix B 2 m itself, and the two distinct matrices T m or M m . The last two matrices 
yield approximations to the corresponding triplets of f(A) + C m and f{A)* +® m . The 
following bound between the largest eigenvalue of T and the largest singular values 
of T m and M m shows that all these quantities can be easily related. Let q = [x;y]. 
Using ||x||||y|| < i (||x|| * 2 + ||y|| 2 ) we obtanfl 


< max 

q*#2 m q 

= max 

q/o 

q*q 

[x;y]#0 


^ lvJ -n 


< max ■ 

[x;y]^0 


4\\ M my 


tt + max 

2 [x;y]/0 


x*x + y*y 

l|y|||l T mx|| 


— 2 + ai(T m )). 


If the inexactness of the bidiagonalization is very large, M m and are very different 
from each other. In this case, the leading singular values of these two matrices - and 


1 Another bound can be obtained for the geometric mean, that is \()\ < \Jo\ (M m )a 1 (T m ). 
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thus their mean - may be significantly larger than the biggest (in modulo) eigenvalue 
of ^ 2 m) since they are related to the numerical radius of Z? 2 m> rather than to its 
spectrum. This motivated us to use the eigenvalues of B^m in the approximation, 
rather than the singular values of its blocks. Moreover, working with £> 2m made the 
analysis of the relaxed strategy particularly convenient, since known results on relaxed 
eigenvalue computation could be exploited. 

4.1. A computable stopping criterion. In this section we analyze a strategy 
for monitoring the convergence of the inexact bidiagonal iteration. As it is common 
to other inexact processes, the true problem residual is inaccessible as soon as inex¬ 
actness takes place. However, on the one hand some stopping criterion needs to be 
introduced to exit the process. On the other hand, it is unclear whether the com¬ 
puted approximations are still meaningful for the original problem, since they were 
computed with significantly modified data. 

Let (6 , q) be an eigenpair of B 2 m, where q is a unit vector. As the iterations 
proceed, {6, W 2 m q) tends to approximate an eigenpair of T. We would like to ensure 
that (0, W 2 m q) also tends to an eigenpair of T. To monitor the convergence of 9 and 
to define a stopping criterion for the outer iteration, the residual is used. We call 
J-W 2m q- 9W-2 m q the true residual , which is not available, since T cannot be applied 
exactly. We thus introduce the computed residual , which is the residual of the actually 
computed quantities, namely (see (14.11) 1 


1*2 m '•— -FW 2m q (W 2m q — tm+l,m 

In the sequel, we shall use the following obvious inequality to estimate the true residual 
norm: 


0 

Vm+l 


elq, (e m G K 2m ). 


||^W2mq-0W 2m q|| < ||r 2m || + ||(^ r W 2 mq — W 2m q) — r 2m ||, (4.2) 

where || (FTV^mq — 0 W 2 mq) — r 2m || is the gap between the computed and the true 
residuals, in short the “residual gap”. If this gap can be imposed to be small, then the 
computed residual will give an estimate for the true residual. In this case, convergence 
can be monitored by only using the (available) computed residual, and the following 
relative stopping criterion can be used: 

if -—- < E 0 ut then stop (4.3) 

\ 9 \ 

for some outer tolerance e ou ti where 8 is the largest (in modulo) eigenvalue. Finally, 
as the computed residual norm goes to zero, the quantity || (^ r W 2m q — dW 2 TO q) — r 2m || 
will tend to dominate again, playing the role of the final attainable accuracy level. 

To see how we can impose the residual gap to be small, and recalling the defini¬ 
tion of Q' 2 m , we first consider a more convenient expression for femq, with q = [x; y], 
that is 


f?2mq — 




Gr m y 

^mUrriX- 


1 

i_ 
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Let Gm = [gi’\ • • ■, gm ]■ Then 


||(-PW 2m q - ew 2 m q) - r 2m || = ||0 2m q|| 


fW V 

y 

kJm X 



' ( 1 ) * ' 

g, e*y 

Sj 2)e j x 


m 


^E 


■ (i) * ■ 

g} J e,y 
( 2 ) * 
«j e j x 


m 


Edi^in^i'-Hi^n 2 

j=i 



(4.4) 


The vectors g j l \ i = 1,2, implicitly carry the error caused by the inexact computation 
of /(A)v and f(A)*u, respectively, in the inner iteration. If every term of this sum is 
small, the computed residual will be close to the true residual. The following Lemma 
states how the inaccuracy in the matrix-vector products relates to the residual gap; its 
proof is based on the corresponding result in |31j , however the structure is exploited 
so as to have a dependence with respect to m instead of 2m, the size of £> 2m . 

Lemma 4.1. Assume that m iterations of the inexact Lanczos bidiagonalization 
process have been taken. 

If ||gj- 1} ||, ||gj 2) || < for 1 <j<m, then ||(JW 2m q - 0W 2ro q) - r 2m || < e. 

Proof. From ||q|| = 1 with q = [x; y] it follows that ||[e*x;e*y]|| < 1. From 
(TOl) we obtain 


||(J r W 2m q — 0W 2m q) - r 2m || < Edlgf^ 2 l e ^| 2 + llgf’ll 2 l e j x | 2 )' 


3 =i 


< E -dlejyl 2 + NM 2 ) 1 < 1 = ^ D 


3=1 


3=1 


This result shows that if e is sufficiently small, then the residual gap will stay 
below the computed residual norm until convergence. In our experiments, m will play 
the role of the maximum number of Lanczos bidiagonalization iterations, which is 
usually set to a number between 50 and 500. 

5. Approximation of /(A)v and f(A)*u and a computable inner stopping 
criterion. The performance of the inexact Lanczos bidiagonalization process depends 
on the approximation accuracy of the matrix-vector products f{A)v and f(A)* u. Due 
to the size of A 1 we consider approximating this quantity by means of a projection-type 
iterative method as follows; we limit our discussion to /(A)v, and a corresponding 
procedure can be used for f(A)* u. Starting with the unit vector v and the matrix 
A, we construct a sequence of approximation subspaces KLk of R n , k = 1,2,..., and 
define the matrix Pk = [pi, p 2 , P3, ..., p*] € C nxfe , whose orthonormal columns span 
the subspace, and v = pi = P^ei, in a way so that the spaces are nested, that is 
K-k Q K-k+i ■ Typical such choices are Krylov and rational Krylov subspaces [19j , H2. 
The desired approximation is then obtained as 


f{A)v « P k f{H k )e i, H k = PfAPk. 

For small fc, the reduced non-Hermitian matrix Hk has small size, so that f(Hk) can 
be computed efficiently by decomposition-type methods [T5I . 

Our stopping criterion of this approximation process is based on an estimation of 
the error norm, and it uses an approach previously introduced in [25J Proposition 2.2]. 
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Proposition 5.1. [25] Proposition 2.2] Assume that k + j inner iterations have 
been executed. Let "Zk+j = Pk+jf{Hk+j) e l be an approximation to /(A)v and define 
Wfe+j = ||z k+j -Zfc||/||zfe||- If \\f(A)v - z k+j \\ < ||/(A)v-z fc || and ||/(A)v|| « ||z fe ||, 
then 

ll/(^) v — Zfc|| « T ~~~— IIZfc||- (5.1) 

1 CUh-\-j 

The result in (15.11) shows that after k + j iterations it is possible to provide 
an estimate of the error norm at iteration k. Therefore, we introduce the following 
stopping criterion for the approximation of /(A)v: 

if — —— — < £i n then stop 
1 ^k+j 

for some inner tolerance £j„. The accuracy of the inner iteration will influence the final 
accuracy of the inexact Lanczos bidiagonalization. In the notation of the previous 
section, once the stopping criterion is satisfied, we have thus derived the following 
estimate for the perturbation occurring in the Lanczos step, 

||/(A)Vfc - Ufc|| = || C'fcVfc || « £m||Ufc||; 

Note that here ||C'fe v/- 1| = ||g[ 1 '||, with the notation in (14.41) . An analogous relation 
holds with respect to f(A)*\ik and thus ||g[ 2) ||. We stress here that, since the ap¬ 
proximation process changes at each iteration, the quantity ||CfcVfc|| will vary as the 
Lanczos bidiagonalization proceeds. The threshold itself may vary during the Lanczos 

iteration, so that £j ra = e} n J . As experienced with other eigenvalue and linear system 
(k) 

problems, e in may even be allowed to grow during the iteration, without significantly 
affecting the overall process. This is discussed in the next section. 

6. Relaxing the inner solution accuracy. The bound in (14.41) on the residual 
gap suggests that the accuracy on the inner solution approximation can be relaxed as 
convergence takes place. Indeed, following similar strategies in mm®, we observe 
that it is the product ||g^|| |e*y| in (14.41) that needs to be small, and not each factor, 
to ensure a small gap; the same for ||g( 2 ^|| |e*x|. Therefore, if |e*y| is sufficiently 

small, indicating that the (m + j) th component of the eigenvector q is small, ||g^|| is 
allowed to be larger, and the required accuracy of £ can still be achieved. This induces 
a variable (possibly growing) accuracy in the inner iteration, which drives the size of 
|| gj 1 ||. In the following we shall first show that the quantities |e*y| and |e*x| do tend 
to decrease as the approximation improves. We then derive a computable expression 
for the variable stopping tolerance in the approximation of f(A)v and f(A)* u at each 
iteration of the resulting “relaxed” Lanczos bidiagonalization process. This strategy 
may be convenient in case the cost approximating /(A)v and f(A)* u is very high, as 
is the case for instance if an accurate approximation to the leading singular triplets 
is requested. 

6.1. Spectral properties of the approximate singular triplets. To ensure 
that the magnitude of ||gj fc ^||, k = 1,2, can be relaxed in the bound (14.41) . we need 
to verify that |e*x| and |e*y| become small as convergence takes place. This fact has 
been verified in the eigenvalue setting in m, however the peculiar structure of the 
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Lanczos bidiagonal recurrence requires the validation of the results in [31) . To this 
end, we first define the submatrix of B 2m of size 2k as 


B 2k 


0 M k 
T k 0 1 


where M k , T k are the leading portions of the corresponding m x m matrices. Let 
ql 2fc )) be an eigenpair of B 2k , where q^ 2k ) = [x; y] has unit norm, and x, y £ C k . 
Further, let 


x 


q = 


o 

y 

o 


( 6 . 1 ) 


where the 0-vectors have length m — k, and define X = [q, Y], where Y is chosen such 
that X is unitary. Define B 2m = Y*B 2m Y € £( 2 m-i)x( 2 m-i)_ ^he following result 
shows that under certain hypotheses some of the components of the approximate 
eigenvectors do tend to zero as convergence takes place. Its proof is analogous to that 
of [52 Prop. 2.2], and can be found in the appendix. 

Proposition 6.1. Let (#( 2fc ),q( 2fc )) be an eigenpair of B 2k , and q be as defined 


in (16.11) . 

tk+l,k 


Let s 

0 

Vfe+l 


2 m 


e fe q 


= q *B 2m - 0( 2fc )q* 

(2k) jj 


S 2 m, 2 k = amin{B 2m ~ 6» (2fc) /) > 0, and r 2k = 


ll r 2fc|| < 


u 2m,2k 

4irai’ 


then there exists a unit norm eigenvector q = [xi; X 2 ; yi; y 2 ] of B 2m with xi, yi S C fc , 
x 2 ,y 2 S C m ~ k , such that 

T 

\/l + T 2 ’ 

with t £ K, 0 < r < 2 ^ . Moreover, if 8 is the eigenvalue associated with q, we 

have 


x 2 

Y2 


\e-0^\< ||s 2m ||r. 


( 6 . 2 ) 


This proposition states that if after k < m iterations of Lanczos bidiagonaliza- 
tion the computed residual ||r 2 fc|| is sufficiently small, then there exists an eigenvector 
of B 2 m such that some of its components are bounded correspondingly. These are 
precisely the components that allow us to relax the accuracy in the inner iteration. 
Note that 5 2rn , 2k gives an indication of the distance between the spectrum of B_ 2 m an d 
f?( 2fc ). It should be kept in mind that for non-normal matrices, the value of S 2mt2k 
may be much smaller [33] Example 2.4, p. 234]. On the other hand, since B 2m is 
a perturbation to a Hermitian matrix, the quantity s 2m is an approximate residual 
for (8^ 2k \ qOO) as an eigenpair of B 2m , and thus it will be small as m grows. As a 
consequence, the condition in the theorem is likely to be satisfied, and the eigenvalue 
error (IQ) may be much smaller than r. 
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6.2. Variable accuracy in the inner approximation. In this section we 
show that relaxation in the inner accuracy at step k < m is possible if there exists an 
eigenpair (0 2 ( fc_1 ), q 2 ( fe_1 )) of Z? 2 (fc- 1 ) such that 


ll r 2(fc-l)|l < 


x2 

u 2m,2(fc-l) 

4||s 2m || ’ 


V0i e A Oj ^ 0 , 


|q _ 02{k—X) I > 0 ll S 2m|lll r 2(fc-l)ll 

$2m,2(k-l) 


(6.3) 

(6.4) 


The first condition m ensures that there exists an eigenvector q of B 2m whose 
specified components are small, according to Proposition 16. II Let 8 be the eigenvalue 
associated with this q. The second condition, m, guarantees that the eigenvalue 
0 2 ( fc -D of B 2{k _!) is a perturbation of the eigenvalue 8 of B 2m , which is the final 
approximation to the original problem. The following theorem states how the use of 
a variable accuracy will still guarantee a small residual gap, and hence yields a true 
residual with an accuracy which is bounded by the accuracy of the gap, in agreement 
with (14.21) . 

Theorem 6.2. Assume m inexact Lanczos bidiagonalization iterations are car¬ 
ried out. Let (0, q) be an eigenpair of B 2m , where 8 is simple and ||q|| = 1. Given 
0 < e out £ R, with the notation of &4-4D assume that for k = 1,... ,m 


|g£ 1} i 


|g[ 2) ii < 


<52m,2(fc-l) _ 

2 m\\r 2 ^- 1) \\ £ out 


if k > 1, and there exists (q 2 ( fc 1 ^,0 2 ( fc 1 ^) 
of B 2 (k-\) satisfying (16.31) and (16.41) . 


I —£out otherwise. 

v m ULLb 

Then ||(J r W 2m q - 6'W 2m q) - r 2m || < £ ou t- 


(6.5) 


Proof. This proof is analogous to the proof of Theorem 3.1 in m- Suppose 
that at the (k — l)th iteration there exists an eigenpair (0 2 ( fc_1 ), q 2 ( fc_1 )) of B 2 (k-i) 
satisfying the conditions (16.31) and (16.41) . This implies that 8 2( - k ^ is a perturbation 
of the considered eigenvalue 8 of B 2m , since 8 is the only eigenvalue of B 2m such that 


| Q _ ^2(fc-l)| < 2 ll S2m ll II r 2 (fc 1) II 

^2m,2(Jc-l) 


Let K, C {l,...,m} be defined such that for each k £ K, there exists a eigenpair 
(q 2 ( fe_1 ),0 2 ( fc_1 )) of £? 2 (fc-i) satisfying the conditions (16.31) and (16.41) . Then, similar 
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to the reasoning in the proof of Lemma 14.11 and using (14.41) , 

m 

IKW^q - 6>W 2m q) - r 2m || = ||£ 2m q|| < ^(IlgEl 2 l e fc y | 2 + llsEl 2 l e fc x | 2 )^ 

k= 1 

< 5Z^ g fc 1) ll 2 l e fe y l 2 + llgfc 2) H 2 l e fc x l 2 )^ + E dl g i 1) ll 2 l e fe y l 2 + llgi 2) ll 2 l e fe x l 2 )^ 


fee/c 


k0 c. 

fe<m 


< E l^yl 2 + W*l¥ + E — (l^yl 2 + |e;*| ! )i 

^ 2m||r 2(fe _ 1) || m 


fe^/C, 

k<.m 


^E 


fee/c 


^2m,2(k— l)£out ^ ||l" 2 (fc—1)|| £ 0 ut 

2m||r 2(fe _ 1) || <5 2 m,2(fc-i) ^ m 

k<.m 


|/C| , m- |/C| _ 

£out “t - £out — £out U 

m m 


Algorithm 1: Inexact Lanczos bidiagonalization 

Input: A E C nXn non-Hermitian, a function /, a maximum number of (outer) iterations m, an 

(outer) tolerance £ 0 itt- 

Output: An approximation to the leading singular triplet 

1: Choose vi with ||vi|| = 1, and set V = [vi], U = 0, M = 0, T = 0. 

2: for j = 1 , ,m 

3: z » /(A)vj 

4: Uj,mj •<— rgs(z,£7) 

5: Expand basis: U = [U, u^] 

6: Expand matrix: M = [M, mj] (the old M is first padded with a zero row) 

7: z»/(A)* Ui 

8: v j+ i,tj <— rgs(z, V) 

9: Expand basis: V = [V , Vj+i] 

10: Expand matrix: T = [T,tj] 

11: K = [zeros(j , j) , M ; T , zeros(j,j)] 

12: [Q,D]<—eig(K) 

13: (0,q) <— with 0 = max^ \Da\ (extract x, y from q = [x; y] with ||x|| = 1, ||y|| = 1) 

14: Convergence check: if | T(j + 1, j)<l(j)\/0 < e 0 ut then return (0,x,y) and stop 

15: If required: compute variable tolerance to be used in the next iteration 

16: end 


7. Practical implementation. Algorithm 1 implements the inexact Lanczos 
bidiagonalization to approximate ||/(A)|| 2 and the associated singular vectors. The 
function rgs(z, Z ) double orthogonalizes the vector z with respect to the orthogonal 
columns of Z , and returns the orthogonalization coefficients. The same algorithm can 
be used to approximate more singular triplets. 

At every iteration of the Lanczos bidiagonalization, two inner iterations (line 4 
and line 7) approximate the corresponding matrix-vector multiplication /(A)v and 
f(A)* u, respectively. The inner iteration uses one of the algorithms for approximating 
the action of a matrix function to a vector, as discussed in section [5j In theory, any 
such algorithm could be used; in our experiments we employed both the standard 
and extended Krylov subspace methods. In case of variable inner tolerance, the next 
inner tolerance is computed at the end of every Lanczos bidiagonalization iteration. 
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Moreover, after the second outer iteration, we require that the inner stopping criterion 
is such that 


max {||gfc 1) ||, l|gfc 2) ||} < 


^2m,2(k-l) 

2m||r 2 (fc—i) || e ° ut ' 


Note that a relative criterion is always used, that is, in practice the quantity to be 
checked is divided by the current approximation 0 2 ( fe_1 ). This corresponds to using 
£ out = E ou t for some fixed value e 0 ut- Since <^2m,2(fc-i) is not available at 

iteration k, we consider the following approximation: 

5 2 ^:= _ min \ 6 2 ^ - 0 j\. 

^6A(8 2(fc _ 1) )\{6» 2 ( fe - 1 )} 


In fact, 62 m, 2 (k- 1 ) can be much smaller than the computed 5 2 ^ k 1 i. However, it will 
not be overrated much when is converging^ to the corresponding eigenvalue 

6 of B 2 m, since it is related to the sensitivity of 132 m and not of the matrix J-. If 
the d 2 ^" 1 ) is very small, it constrains the inner accuracy to be very small too. This 
occurs when the largest eigenvalues of B 2 m are clustered. 


8. Numerical experiments. In this section we report on our numerical ex¬ 
periments to evaluate the performance of the inexact Lanczos bidiagonalization for 
different combinations of matrices and functions. All experiments were performed 
with Matlab Version 7.13.0.564 (R2011b) on a Dell Latitude laptop running Ubuntu 
14.04 with 4 CPUs at 2.10GHz. We are mainly interested in the first singular triplet 
of /(A), so as to obtain ||/(A)||. We considered five different matrices, summarized 
in Table l8Tl all of dimension n = 10,000 except A 4 . The spectrum of a sample of 
these matrices of smaller size, n = 1000, is reported in Figure fSTTl For A 4 , the matrix 
originating from a cavity driven problem was shifted by 10 / so that all functions could 
be treated with all methods; we refer to the Matrix Market site for more information 
on this problem m- For A 5 a 5-point stencil finite difference approximation was 
used, together with homogeneous Dirichlet boundary conditions. We considered the 
following functions, 

exp(x), exp(-x), y/x, ~^=, ex P( v^) — l 

yjx X 

We note that all these functions allow for an efficient computation when applied to 
small scale matrices, by means of specifically designed Matlab functions; see jl9] . 
The performance also critically depends on the choice of the inner method for ap¬ 
proximating /(A)v and /(A)*u at each iteration. We shall report our experience 
with the standard and extended Krylov methods. The Rational Krylov method could 
also be employed for this approximation. 

A random vector is used to start the inexact Lanczos process. Convergence is 
monitored by checking the computed residual norm with respect to the first singu¬ 
lar triplet, and the inexact Lanczos bidiagonalization terminates as soon as CEO) is 
satisfied; different values for e out will be considered. In case more than one triplet 
is desired, then all corresponding residuals should be monitored. In section 18.11 we 
explore the fixed inner tolerance method, and the dependence of its performance on 
all the other parameters, including the outer accuracy. Indeed, if only a rough ap¬ 
proximation of ||/(A)|| is required, the computational efforts should proportionally 
low. In section 18.41 the influence of the variable (relaxed) inner tolerance described 
in section T 6.21 is analyzed, thus a more stringent final accuracy is considered so as to 
exercise the variable inner threshold. 
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Matrix 

Structure 

Description 

A\ 

tridiag(0, A i, 0.3) 

= I 1 + p\ 1) ) +*<W (2) -0.5) 

^2 

tridiag(1.5, 2, — 1) 


A3 

Toeplitz 

i-th row: [4,0, 0, 0,0, —2, 0,10, 0,0, 0, 6 ] 

A 4 

shifted E20R1000 

Driven cavity problem (Matrix Market) shifted as: 

A := A e 20riooo + 107 

A 

Sparse 

Centered Finite Difference discretization of 

C(u) = — V 2 u — 100it x — lOOrty 


Table 8.1: Description of the selected matrices, all of size n = 10,000, except A 4 , of 
size 4241. p'f 1 is a random entry taken from a uniform distribution in (0,1). 



Fig. 8.1: Spectrum of matrices A, A 2 , A 3 , A 5 (from left to right) in Table [H~il for a 
smaller size, n = 1000 . 


8.1. Assessing the effectiveness of the inexact bidiagonalization. We 

analyze the performance of the inexact method when approximating ||/(A)|| together 

with the associated singular vectors. To this end, we need to monitor the number of 

iterations of both the outer and the two inner iterations, together with the execution 

time required to reach the required tolerance. In particular, we show both the total 

and average number of inner iterations. We also display the distance between the 

final first approximate singular value, ay and the second approximate singular value, 

a 2 : a small relative distance implies that the method will take more iterations to 

converge. Moreover, this distance cannot be easily predicted from the matrix A, 

although it significantly influences the computation. For instance, the largest (in 

1 

modulo) eigenvalues of T associated with the matrix function are: 

-1.7965100, 1.7965100, 1.7964169, -1.7964169, 1.7962429, -1.7962424 . 

Although this fact does not constitute a difficulty if just the order of magnitude of 
1 

|| II is saught, it indicates that requiring a more accurate approximation will lead 
to significantly more expensive computations. This problem can be readily observed 
by comparing the outer number of iterations in Table 18.21 and Table 18.31 where we 
report the results of our experiments for e out = 10” 2 and e ou t = 10 -4 , respectively. 
In both cases, the inner tolerance was set to £j n = £ 0 Mt/(m max ), where m max = 1000, 
so that £i n = 10“' for the more stringent outer tolerance. For all examples, the first 
six significant digits of ay are reported. 

Comparing the two tables also shows that the singular values are as accurate 
as the outer tolerance can predict: for smaller e out already the third singular value 
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matr 

function 

/ 

VI 

O-f—£72 

tot # 

outer 

tot # 

inner 

average 

# inner 

exec 

time 

Ai 

exp(— x) 

0.463506 

3.34e-02 

14 

308 

11.0 

0.39 


yfx 

1.50143 

1.29e-02 

15 

351 

11.7 

0.44 


exp( — y/x) — 1 

0.727351 

2.11e-02 

14 

414 

14.8 

0.70 


exp (a;) 

9.19137 

1.76e-02 

16 

352 

11.0 

0.59 


l/y/x 

1.10350 

2.33e-02 

13 

364 

14.0 

0.73 

A2 

exp(—#) 

0.222621 

1.78e-02 

11 

308 

14.0 

0.24 


yfx 

1.7921 

2.05e-02 

8 

252 

15.8 

0.28 


exp( — y/x) — 1 

0.469829 

1.81e-02 

10 

429 

21.4 

0.68 


exp (a:) 

12.1783 

9.01e-02 

5 

139 

13.9 

0.14 


l/y/x 

0.814356 

2.26e-02 

8 

324 

20.2 

0.44 

A 3 

exp(— x) 

0.508086 

1.48e-02 

13 

709 

27.3 

0.61 


y/x 

4.56086 

1.85e-02 

12 

1036 

43.2 

3.11 


exp( — y/x) — 1 

0.615673 

1.84e-02 

12 

1953 

81.4 

18.29 


exp (a;) 

6.75709-10 8 

1.45e-02 

13 

694 

26.7 

0.83 


l/y/x 

0.959018 

1.70e-02 

12 

1852 

77.2 

14.56 

A4 

exp(—a:) 

0.000172183 

2.42e-01 

6 

456 

38.0 

0.58 


yfx 

6.09177 

2.78e-02 

14 

454 

16.2 

0.81 


exp( — yfx) — 1 

0.118301 

3.07e-02 

10 

486 

24.3 

1.15 


exp (a:) 

3.15141-10 10 

1.35e-01 

5 

397 

39.7 

0.49 


1 Isfx 

0.354039 

5.55e-02 

9 

394 

21.9 

0.81 

A s 

exp(— x) 

0.99709 

3.39e-02 

7 

223 

15.9 

0.34 


y/x 

2.81987 

1.16e-02 

16 

5145 

160.8 

193.35 


exp( — yfx) — 1 

6.93384 

2.44e-01 

4 

1570 

196.2 

111.67 


exp (a:) 

2959.17 

1.84e-02 

14 

450 

16.1 

0.65 


l/y/x 

7.36692 

2.31e-01 

4 

1564 

195.5 

112.22 


Table 8.2: Inexact Lanczos bidiagonalization for approximating the leading singular 
triplet of f(A); outer tolerance e = 10 -2 . 


digit changes, that is it still has to reach its final (exact) value. This is obviously 
also related to the relative distance from the second singular value, which is better 
captured for a smaller e ou t- 

We also observe that the choice of / strongly influences the overall performance: 
the bidiagonalization process may take the same number of (outer) iterations for two 
different selections of /, and yet the total computational cost be significantly different 
(see A\ and A 3 in Table f8T2l) . As a consequence, the number of outer iterations is not 
a realistic measure of the algorithm complexity. 

On a negative side, we observe that in both tables the method performs poorly 
on A 3 for f(x ) = y/x. For this particular matrix, the inner method takes very many 
iterations during the whole Lanczos process, with a number of inner iterations that 
is close to the average throughout. We anticipate that this is not the case for the 
power method, where as the outer iterations proceed, drastically fewer iterations are 
required in the inner approximation. This phenomenon seems to be peculiar to this 
combination of function and matrix, since in all other cases the performance of the 
Lanczos and power methods is more similar, and it will be further investigated in a 
future study. 

Finally, for the exponential functions exp(a;), exp(— x) we computed the upper 
bound in (12.111 by using the Matlab function eigs applied to ^(A + A*). In all cases 
except matrix A 4 the estimate is pretty sharp. On the other hand, for exp(A_ 4 ) the 
upper bound was 2 • 10 13 , which is three orders of magnitude larger than the actual 
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matr 

function 

/ 

&1 

CT1—CT2 

5-1 

tot # 

outer 

tot # 

inner 

average 

# inner 

exec 

time 

Ai 

exp(— x) 

0.463735 

2.04e-02 

24 

624 

13.0 

0.87 


yfx 

1.50496 

8.76e-04 

53 

1775 

16.7 

3.27 


exp( — y/x) — 1 

0.728200 

7.62e-03 

29 

1160 

20.0 

2.47 


exp(fc) 

9.19576 

9.22e-03 

32 

832 

13.0 

1.21 


1 

1.10504 

5.52e-03 

29 

1156 

19.9 

2.21 

A 2 

exp(— x) 

0.223129 

4.88e-05 

209 

7104 

17.0 

26.72 


yfx 

1.79651 

5.18e-05 

162 

8069 

24.9 

18.92 


exp( — y/x) — 1 

0.470776 

3.85e-05 

193 

12320 

31.9 

42.03 


exp(rc) 

12.1825 

8.39e-04 

47 

1596 

17.0 

1.41 


1 

\/x 

0.816492 

5.90e-05 

150 

9210 

30.7 

29.43 

A 3 

exp(— x) 

0.509010 

4.43e-05 

224 

14544 

32.5 

31.55 


y/x 

4.57175 

3.35e-05 

250 

41844 

83.7 

533.17 


exp( — y/x) — 1 

0.616989 

1.20e-04 

155 

39968 

128.9 

1078.46 


exp(rc) 

6.77296-10 8 

1.17e-04 

183 

11660 

31.9 

19.91 


1 

Vx 

0.960790 

2.12e-05 

312 

77958 

124.9 

1827.25 

A4 

exp(— x) 

0.000172195 

2.32e-01 

9 

783 

43.5 

1.26 


y/x 

6.09289 

2.38e-02 

22 

1103 

25.1 

2.25 


exp( — y/x) — 1 

0.118347 

2.44e-02 

16 

1109 

34.7 

3.16 


exp(rc) 

3.15148-10 10 

1.34e-01 

7 

626 

44.7 

0.87 


1 

\/x 

0.354473 

1.18e-02 

19 

1227 

32.3 

3.91 

A 5 

ex.p(—x) 

0.998062 

7.74e-03 

24 

911 

19.0 

1.37 


\fx 

2.82811 

1.67-04 

185 

70926 

191.7 

4059.40 


exp( — y/x) — 1 

6.93435 

2.32-01 

7 

2814 

201.0 

186.39 


exp(rc) 

2975.18 

2.91e-03 

55 

2091 

19.0 

3.20 


1 

\/x 

7.36768 

2.17-01 

7 

2814 

201.0 

197.62 


Table 8.3: Inexact Lanczos bidiagonalization for approximating the leading singular 
triplet of /(A); outer tolerance e = 10 -4 . 


norm; foi@ exp(—Aj) the upper bound was 0.004737, which is more than one order of 
magnitude larger than the actual value, 0.000172. This example illustrates that, as 
discussed in section[2] the accuracy of this type of estimate cannot be easily monitored, 
especially in the case of non-normal matrices. 

8.2. Comparisons with the power method. We wish to compare the per¬ 
formance of the new method with that of the power method, as described in section 
O Since in most cases, the leading singular values are not well isolated, we expect 
that the power method will be slow if an accurate approximation is required. There¬ 
fore, we only report results for e = 10~ 2 . Moreover, our experience is that since the 
computation is inexact, the product f(A)*(f(A)v) may give complex values, since 
the computed actions of f(A) and f{A)* are not the conjugate of each other. As a 
result, the approximate eigenvalue may be complex, though with a small imaginary 
part, and the quantity that is actually computed is given by 




( v ( fe ))*/(A)*/(A)v (fe) 


where v- fc ) is the power met hod direction after k iterations. Consequently, at conver¬ 
gence we obtain dy ss V \A). The stopping criterion is based on the relative eigenvalue 


2 This bound is obtained for exp(.4) with A = — A 4 . 
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matr 

function 

tot # 
outer 

tot # 
inner 

<5l 

residual 

norm 

exec 

time 

Ai 

exp(— x) 

51 

1071 

0.46327 

9.8648e-03 

1.5 


\Jx 

93 

2010 

1.5028 

9.8607e-03 

2.8 


exp( — y/x) — 1 

61 

1782 

0.71879 

9.8904e-03 

3.5 


exp(:r) 

65 

1409 

9.1666 

9.9964e-03 

2.0 


1 /yTx 

69 

1942 

1.0938 

9.8670e-03 

3.3 

A2 

exp(— x) 

36 

899 

0.22238 

9.8636e-03 

0.8 


yfx 

37 

979 

1.7903 

9.7995e-03 

1.1 


exp( — y/x) — 1 

36 

1216 

0.46921 

9.7553e-03 

2.1 


exp (a?) 

9 

232 

12.176 

9.4623e-03 

0.2 


l/s/x 

36 

1215 

0.81375 

9.8715e-03 

1.8 

A3 

exp(— x) 

38 

1605 

0.50724 

9.9024e-03 

1.5 


y/x 

41 

1000 

4.5564 

9.8934e-03 

1.3 


exp( — y/x) — 1 

34 

4448 

0.61486 

9.9901e-03 

39.1 


exp(ic) 

38 

1699 

6.7455-10 8 

9.7718e-03 

1.7 


l/s/x 

36 

4684 

0.95774 

9.7264e-03 

36.3 

A4 

exp(— x) 

11 

825 

0.00017219 

5.8988e-03 

1.0 


yfx 

56 

1710 

6.0870 

9.9037e-03 

3.0 


exp( — y/x) — 1 

28 

1309 

0.11823 

9.5839e-03 

3.3 


exp (a:) 

10 

775 

3.1510-10 10 

8.8031e-03 

1.1 


1/Vi 

33 

1361 

0.35405 

9.9455e-03 

2.8 

A s 

exp(— x) 

15 

376 

0.99643 

9.9168e-03 

0.52 


y/x 

52 

1479 

2.81329 

9.9649e-03 

18.47 


exp(— y/x) — 1 

7 

2745 

6.93355 

9.6566e-03 

189.19 


exp(ic) 

55 

1363 

2956.34 

9.8499e-03 

1.79 


l/y/x. 

8 

3137 

7.36721 

6.9885e-03 

202.71 


Table 8.4: Power method for approximating the leading singular triplet of /(A); outer 
tolerance e = 10 -2 


residual norm, that is 

|| y (*)_ A (k) v (*)||/ A (*) < £ouU 

where y^ is the result of the approximation of f (A)*(f (A)\r ^ k )). Note that we kept 
the same tolerance as for the Lanczos bidiagonalization, although a more stringent 
tolerance may be required in practice. Table I5~41 collects the results for all test cases. 

As expected, the power method is more expensive than the Lanczos procedure, 
on average four to five times more expensive, in all those cases when the first singular 
value is not well separated from the second one. Only for the cases of good separation, 
for instance with A$ and the functions (exp(- v /x) — l)/x and 1 /y/x, convergence is 
reached in very few iterations, and the power method is competitive. 

We also implemented the power method as described in pjj, Algorithm 3.19], 
using the relative singular value residual as stopping criterion. The performance, both 
in terms of inner and outer number of iterations, is comparable to that of Table [HTTl 
Finally, we stress that in both implementations the stopping criterion involves inexact 
matrix-vector products, therefore the monitored quantity is not the true residual of 
the corresponding problem. 

8.3. Numerical tests with the extended Krylov subspace. If high accu¬ 
racy is required for the final approximation, so that a more stringent outer tolerance is 
used, then the inner iteration also requires more computational effort, as its stopping 
tolerance is also decreased. In this case, it may be appropriate to use more effective 
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matr 

function 

°1 

£Tl—CT2 

5-1 

tot # 

outer 

tot # 

inner 

average 

# inner 

exec 

time 

A\ 

exp(— x) 

0.463735 

2.04e-02 

24 

480 

10.0 

3.49 


sfx 

1.50496 

8.76e-04 

53 

954 

9.0 

8.24 


exp( — y/x) — 1 

0.728200 

7.62e-03 

29 

522 

9.0 

4.29 


exp(ic) 

9.19576 

9.22e-03 

32 

704 

11.0 

5.82 


1 

\/x 

1.10504 

5.52e-03 

29 

522 

9.0 

4.59 

A2 

exp(— x) 

0.223129 

4.88e-05 

209 

5434 

13.0 

36.71 


yjx 

1.79651 

5.18e-05 

162 

3564 

11.0 

20.03 


exp( — y/x) — 1 

0.470776 

3.85e-05 

193 

4246 

11.0 

29.99 


exp(rr) 

12.1825 

8.39e-04 

47 

1408 

15.0 

5.07 


1 

\/x 

0.816492 

5.90e-05 

150 

3300 

11.0 

18.70 

A 3 

exp (—a:) 

0.509010 

4.43e-05 

224 

11827 

26.4 

106.31 


sfx 

4.57175 

3.35e-05 

250 

9402 

18.8 

84.26 


exp( — y/x) — 1 

0.616989 

1.20e-04 

155 

5578 

18.0 

40.02 


exp(rr) 

6.77296-10 s 

1.17e-04 

183 

11169 

33 

112.76 


1 

yfx 

0.960790 

2.12e-05 

312 

11449 

18.3 

125.20 

a 4 

exp(— x) 

0.000172195 

2.32e-01 

9 

376 

20.9 

4.20 


sfx 

6.09289 

2.38e-02 

22 

483 

11.0 

5.99 


exp( — y/x) — 1 

0.118347 

2.44e-02 

16 

318 

9.9 

4.32 


exp(rr) 

3.15148-10 10 

1.34e-01 

7 

527 

37.6 

4.66 


1 

yfx 

0.354473 

1.18e-02 

19 

416 

10.9 

4.88 

A s 

exp(— x) 

0.998062 

7.74e-03 

24 

887 

18.5 

11.95 


sfx 

2.82811 

1.67e-04 

185 

8165 

22.1 

121.99 


exp( — y/x) — 1 

6.93435 

2.32e-01 

7 

294 

21.0 

5.01 


exp(fc) 

2975.18 

2.91e-03 

55 

2090 

19.0 

25.19 


1 

yfx 

7.36768 

2.17e-01 

7 

294 

21.0 

4.32 


Table 8.5: Inexact Lanczos bidiagonalization, outer tolerance e = 10 4 , inner approx¬ 
imation: extended Krylov subspace method. 


methods. One such possibility is the extended Krylov subspace method mm, which 
may be convenient in case the considered function requires good approximation of the 
eigenvalues of A closest to the origin. In Table [831 we report the runs for £ out = 10 -4 
when EKSM is used; these numbers should be compared with those in Table I8~51 We 
notice that EKSM requires the solution of a system with A (or A*) at each iteration; 
to limit computational costs, a sparse LU factorization of A was performed and stored 
once for all at the beginning of the Lanczos bidiagonalization, and used repeatedly in 
the inner iteration. This represents a tremendous saving with respect to more general 
rational approximations, where solves with (A — Tjl ) have to be performed at each 
inner iteration, with t 3 varying with the inner step. 

In Table 18.51 all cases where EKSM provides faster convergence, that is lower 
execution time, are marked in boldface. It is clear that EKSM is beneficial when good 
approximations to both ends of the spectrum are required, as is the case for x a . The 
lack of improvement in the case of the exponential is not unexpected, as it is known, 
at least in the Hermitian case, that only one extreme of the spectrum needs to be 
captured for a fast approximation of exp(H)v. 

We also remark that EKSM could also be employed as inner method in the case 
of the power iteration used in section 18.21 

8.4. Numerical tests with variable accuracy. In the previous sections, for 
£ out — 10~ 4 the inner tolerance was set to the fixed value £i n = 10 -7 . Here we explore 
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the performance of the inexact computation when the inner tolerance is relaxed. 

A relaxed inner accuracy is most convenient when each inner iteration is expen¬ 
sive, so as to profit from a lower number of inner iterations. Therefore, we report on 
our experience with the extended Krylov subspace as inner method, as the method re¬ 
quires one system solve with the coefficient matrix at each iteration. A more stringent 
outer tolerance was used, that is e ou t = 10 , than in previous experiments, so as to 

clearly see the relaxation in the inner tolerance; we also used m max = 50 as maximum 
number of iterations, so as to balance the much smaller e ou t for determining the initial 
inner tolerance. 

Figure [5721 shows the performance of the relaxation strategy for A$ and f(x) = 
1 / yjx. The plot shows the outer convergence history as the bidiagonalization proceeds, 
and the corresponding variable inner tolerance. The digits next to each iteration 
report the actual numbers of inner iterations by means of EKSM to reach the required 
inner accuracy for approximating /(A)vfc; similar numbers were observed for f(A)*Uk- 



Fig. 8.2: Relaxed inner iteration for variable stopping tolerance, to approximate 
||A«r 1/2 ||, with e ou t = 1CT 7 . 


We also experimented with the approximation of more than one triplet. We 
report on our findings for A\ and again f(x) = 1/ \fx (similar accuracies were obtained 
for other functions for the same matrix); to explore the variable inner accuracy we 
used £ 0 ut = 1CT 9 and m m ax = 100. Table 18.61 shows the largest ten singular values 
obtained with a fixed inner tolerance of 10” 11 (ay, second column), and with a relaxed 
inner tolerance (a^ l \ third column), which a-posteriori we observed to go from 10” 11 

up to 10” 5 . The last column reports the relative error \dj — In both cases, 

the inexact Lanczos iteration was stopped as soon as the outer stopping criterion 
was satisfied for the largest singular value. While in the fixed inner tolerance case 
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j 



\dj - ay l> \/aj 

1 

1.117020718026223 

1.117020718026212 

9.939144428645173e-15 

2 

1.107884805324699 

1.107884805324724 

2.264769787972212e-14 

3 

1.098394607515649 

1.098394607513931 

1.564063676705998e-12 

4 

1.095557563655289 

1.095557550135225 

1.234080654983002e-08 

5 

1.087939266226247 

1.087939266157844 

6.287397018996839e-ll 

6 

1.081739455193175 

1.081739454786304 

3.761265981356762e-10 

7 

1.077326677541678 

1.077326677826174 

2.640758724893079e-10 

8 

1.070641401649297 

1.070641400153385 

1.397211109700097e-09 

9 

1.064637797334345 

1.064637795718615 

1.517633507177347e-09 

10 

1.055679471834666 

1.055679470916211 

8.700132135409333e-10 


_i/2 

Table 8 . 6 : First ten approximate singular values of A x ' with fixed tolerance ( e ou t = 
10 -9 ), and relaxed inner tolerance. 


the number of iterations varied between 28 and 30, in the flexible case a number 
of iterations as low as 15 was needed to satisfy the inner criterion at the last stage 
of the convergence process. After exiting the flexible procedure, however, the first 
ten approximate singular values are very close to those obtained with the fixed inner 
tolerance, much closer than warranted by the final inner accuracy of 10~ 5 . This 
shows in particular that the flexible inner tolerance is conservative, and more accurate 
approximations are usually expected. We refer the reader to m for a more detailed 
analysis of the approximation of more than one eigenpair (and thus more than one 
triplet in our context). 

9. Final considerations. We have explored the use of an inexact Lanczos bidi- 
agonalization method for approximating the leading singular triplet of a large ma¬ 
trix function, and in particular its 2-norm. Although several strategies are known 
to provide rough estimates of a matrix function 2 -norm, more accurate approxima¬ 
tions require a careful implementation of available approaches, since neither f(A) nor 
products of the type /(A)v are available exactly. In particular, we showed that the 
Lanczos bidiagonalization yields a non-Hermitian perturbation of the original Her- 
mitian matrix, and the recurrence needs to be revisited. Our numerical experiments 
showed that the computational complexity may vary significantly depending on the 
requested final accuracy, since the two inner iterations for the approximation of f{A)v 
and f(A)*u may be very time and memory consuming. We showed that the relaxed 
strategy alleviates this problem whenever accurate approximations are required. How¬ 
ever, for particular selections of matrices and functions, the approximation of /(A)v 
can still be very expensive, and some other strategies could be exploited, such as 
restarting; see, e.g., Em,nn,nz] and references therein. Finally, our approach could 
be used to estimate the norm of other matrix objects, such as the geometric mean 
[3], or the derivatives of matrix functions, such as the Frechet derivative of the matrix 
exponential or of other functions [111- 
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Appendix A. Proof of Proposition 16.11 Define the submatrix of B 2m of 
size 2 k as 


B 2k = 


0 M k 
T k 0 


i.e. 


B 2 m — 


0 0 M k M* 

0 0 0 * 

T k T* 0 0 

_£fc+i,fc e i e fc * 0 0 


Define the vector q = -^[x; 0; y; 0], where the 0-vectors have length m — k. Let 
X = [q. Y] be such that X is unitary, where Y = [Yi :Y 2 :Y i ; Y 4 ] . This implies that 
^(Yi*x + F 3 *y) = 0, Y 4 Y 4 * =1 = Y 2 Y 2 * and Y 2 Y 4 * = Y 4 Y£ = 0. Now, write 


X*B 2m X = 


q* H 2m q 

q *B 2m Y 


-g(2k) 

g* ' 

Y*B 2m q 

Y*B 2m Y 


. §2 

—2m. 


Here 


|g 2 || = II Y*B, 


'2mq — 


“tT 


M k y 

0 

TfcX 

t k +i,k^i^ k ^ 


II = ||^T 4 *t fc+ i, fc eiefcx|| = || r 2 fc || - 


Further, since s 2m Y = q*B 2m Y — *Y = q*B 2m Y, we have 
||gi|| = ||q^ 2m F|| = ||s* m F||<||s 2m ||. 


Now, by [33, Theorem 2.1, p.230], 


if 


ll r 2fc|| ||S2m|| 

s 2 

u 2m,2k 



i.e., 


11*2*11 


A 2 

u 2m,2k 

4M’ 


then there exists a vector p £ C 2m 1 satisfying r = ||p|| < 2 , such that the unit 

norm vector 


Xl 


( 

X 


pul 

\ 

x 2 

1 

1 

0 

4- 

y 2 


yi 

V 1 + IIpII 2 

V2 

y 


Y :i 


A 2 . 

\ 

0 


[Yi\ 

/ 


is an eigenvector of B 2m . Moreover, 


[ x 2 


1 

rui 


[y 2 _ 


v/l + T 2 

Y 4 

P 


r 

Vl +T 2 


Further, this same theorem states that 1 6 — $( 2fe )| = ||g*p|| < ||gi||||p|| < ||s 2m ||r. □ 









































